AITopics | termination function

Collaborating Authors

termination function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning Abstract Options

Matthew Riemer, Miao Liu, Gerald Tesauro

Neural Information Processing SystemsFeb-14-2026, 15:24:06 GMT

Neural Information Processing Systems http://nips.cc/

abstraction, architecture, hierarchy, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Add feedback

Learning Abstract Options

Matthew Riemer, Miao Liu, Gerald Tesauro

Neural Information Processing SystemsNov-20-2025, 20:08:27 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry:

Education (0.46)
Leisure & Entertainment > Games > Computer Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Add feedback

Enhancing Hierarchical Reinforcement Learning through Change Point Detection in Time Series

Arumugam, Hemanath, Fan, Falong, Liu, Bo

arXiv.org Artificial IntelligenceOct-30-2025

Hierarchical Reinforcement Learning (HRL) enhances the scalability of decision-making in long-horizon tasks by introducing temporal abstraction through options-policies that span multiple timesteps. Despite its theoretical appeal, the practical implementation of HRL suffers from the challenge of autonomously discovering semantically meaningful subgoals and learning optimal option termination boundaries. This paper introduces a novel architecture that integrates a self-supervised, Transformer-based Change Point Detection (CPD) module into the Option-Critic framework, enabling adaptive segmentation of state trajectories and the discovery of options. The CPD module is trained using heuristic pseudo-labels derived from intrinsic signals to infer latent shifts in environment dynamics without external supervision. These inferred change-points are leveraged in three critical ways: (i) to serve as supervisory signals for stabilizing termination function gradients, (ii) to pretrain intra-option policies via segment-wise behavioral cloning, and (iii) to enforce functional specialization through inter-option divergence penalties over CPD-defined state partitions. The overall optimization objective enhances the standard actor-critic loss using structure-aware auxiliary losses. In our framework, option discovery arises naturally as CPD-defined trajectory segments are mapped to distinct intra-option policies, enabling the agent to autonomously partition its behavior into reusable, semantically meaningful skills. Experiments on the Four-Rooms and Pinball tasks demonstrate that CPD-guided agents exhibit accelerated convergence, higher cumulative returns, and significantly improved option specialization. These findings confirm that integrating structural priors via change-point segmentation leads to more interpretable, sample-efficient, and robust hierarchical policies in complex environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2510.24988

Country: North America > United States > Arizona > Pima County > Tucson (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots

Johnson, Brendon, Weitzenfeld, Alfredo

arXiv.org Artificial IntelligenceAug-20-2025

Hierarchical reinforcement learning (HRL) is hypothesized to be able to leverage the inherent hierarchy in learning tasks where traditional reinforcement learning (RL) often fails. In this research, HRL is evaluated and contrasted with traditional RL in complex robotic navigation tasks. We evaluate unique characteristics of HRL, including its ability to create sub-goals and the termination functions. We constructed a number of experiments to test: 1) the differences between RL proximal policy optimization (PPO) and HRL, 2) different ways of creating sub-goals in HRL, 3) manual vs automatic sub-goal creation in HRL, and 4) the effects of the frequency of termination on performance in HRL. These experiments highlight the advantages of HRL over RL and how it achieves these advantages.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2504.18794

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

NAVIX: Scaling MiniGrid Environments with JAX

Pignatelli, Eduardo, Liesen, Jarek, Lange, Robert Tjarko, Lu, Chris, Castro, Pablo Samuel, Toni, Laura

arXiv.org Artificial IntelligenceJul-28-2024

As Deep Reinforcement Learning (Deep RL) research moves towards solving large-scale worlds, efficient environment simulations become crucial for rapid experimentation. However, most existing environments struggle to scale to high throughput, setting back meaningful progress. Interactions are typically computed on the CPU, limiting training speed and throughput, due to slower computation and communication overhead when distributing the task across multiple machines. Ultimately, Deep RL training is CPU-bound, and developing batched, fast, and scalable environments has become a frontier for progress. Among the most used Reinforcement Learning (RL) environments, MiniGrid is at the foundation of several studies on exploration, curriculum learning, representation learning, diversity, meta-learning, credit assignment, and language-conditioned RL, and still suffers from the limitations described above. In this work, we introduce NAVIX, a re-implementation of MiniGrid in JAX. NAVIX achieves over 200 000x speed improvements in batch mode, supporting up to 2048 agents in parallel on a single Nvidia A100 80 GB. This reduces experiment times from one week to 15 minutes, promoting faster design iterations and more scalable RL model development.

agent, implementation, navix, (13 more...)

arXiv.org Artificial Intelligence

2407.19396

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Stay Alive with Many Options: A Reinforcement Learning Approach for Autonomous Navigation

Dukkipati, Ambedkar, Banerjee, Rajarshi, Ayyagari, Ranga Shaarad, Udaybhai, Dhaval Parmar

arXiv.org Artificial IntelligenceJan-30-2021

Hierarchical reinforcement learning approaches learn policies based on hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to sequentially learn such skills without using an overarching hierarchical policy, in the context of environments in which an objective of the agent is to prolong the episode for as long as possible, or in other words `stay alive'. We demonstrate the utility of our approach in a simulated 3D navigation environment which we have built. We show that our method outperforms prior methods such as Soft Actor Critic and Soft Option Critic on our environment, as well as the Atari River Raid environment.

agent, navigation environment, termination function, (15 more...)

arXiv.org Artificial Intelligence

2102.00168

Country:

Asia > Middle East > Jordan (0.04)
Asia > India (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Transportation > Air (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Diversity-Enriched Option-Critic

Kamat, Anand, Precup, Doina

arXiv.org Artificial IntelligenceNov-4-2020

Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, feasibility of option-critic remains limited due to two major challenges, multiple options adopting very similar behavior, or a shrinking set of task relevant options. These occurrences not only void the need for temporal abstraction, they also affect performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2011.02565

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Diverse Exploration via InfoMax Options

Kanagawa, Yuji, Kaneko, Tomoyuki

arXiv.org Artificial IntelligenceOct-6-2020

In this paper, we study the problem of autonomously discovering temporally abstracted actions, or options, for exploration in reinforcement learning. For learning diverse options suitable for exploration, we introduce the infomax termination objective defined as the mutual information between options and their corresponding state transitions. We derive a scalable optimization scheme for maximizing this objective via the termination condition of options, yielding the InfoMax Option Critic (IMOC) algorithm. Through illustrative experiments, we empirically show that IMOC learns diverse options and utilizes them for exploration. Moreover, we show that IMOC scales well to continuous control tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2010.02756

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(23 more...)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Learning Reusable Options for Multi-Task Reinforcement Learning

#artificialintelligenceJan-9-2020, 12:36:56 GMT

The option-critic architecture [2] is a more direct approach that learns options and a policy over options simultaneously. The option policies and their termination functions are trained using policy gradient methods, while the policy over options may be trained using any technique. One issue that often arises within this framework is that the termination functions of the learned options tend to collapse to "always terminate". In a later publication, the authors built on this work to consider the case where there is a cost associated with switching options [6]. This method resulted in the agent learning to use a single option while it was appropriate and terminate when an option switch was needed, allowing it to discover improved policies for a particular task.

learning reusable option, multi-task reinforcement learning, termination function, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Learning Reusable Options for Multi-Task Reinforcement Learning

Garcia, Francisco M., Nota, Chris, Thomas, Philip S.

arXiv.org Artificial IntelligenceJan-6-2020

One of the main reasons why RL has worked so well in these applications is that we are able simulate millions of interactions with the environment in a relatively short period of time, allowing the agent to experience a large number of different situations in the environment and learn the consequences of its actions. In many real world applications, however, where the agent interacts with the physical world, it might not be easy to generate such a large number of interactions. The time and cost associated with training such systems could render RL an unfeasible approach for training in large scale. As a concrete example, consider training a large number of humanoid robots (agents) to move quickly, as in the Robocup competition [ Farchy et al., 2013 ] . Although the agents have similar dynamics, subtle variations mean that a single policy shared across all agents would not be an effective solution.

agent, probability, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2001.01577

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback